An Annotation Schema for Preposition Senses in German
Abstract
Prepositions are highly polysemous. Yet little effort has been spent on developing language-specific annotation schemata for preposition senses that would allow the polysemy of prepositions to be represented and analyzed systematically in large corpora. In this paper, we present an annotation schema for preposition senses in German. The annotation schema includes a hierarchical taxonomy and also allows multiple annotations for individual tokens. It is based on an analysis of usage-based dictionaries and grammars and has been evaluated in an inter-annotator agreement study.

1 Annotation Schemata for Preposition Senses: A Problem to be Tackled

It is common linguistic wisdom that prepositions are highly polysemous. It is thus somewhat surprising that little attention has been paid to the development of specialized annotation schemata for preposition senses.[1] In the present paper, we present a tagset for the annotation of German prepositions.

[1] The Preposition Project is a notable exception (cf. www.clres.com/prepositions.html).

The need for an annotation schema emerged in an analysis of so-called Preposition-Noun Combinations (PNCs), sometimes called determinerless PPs or bare PPs. PNCs minimally consist of a preposition and a count noun in the singular that appear without a determiner. Examples from German are given in (1).

(1) auf parlamentarische Anfrage (after being asked in parliament), bei absolut klarer Zielsetzung (given a clearly present aim), unter sanfter Androhung (under gentle threat)

The preposition-sense annotation forms part of a larger annotation task of the corpus, in which all relevant properties of PPs and PNCs receive either automated or manual annotations. In developing an annotation schema for preposition senses, we pursue two general goals:

I. An annotation schema for preposition senses should provide a basis for manual annotation of a corpus to determine whether the interpretation of prepositions is a grammatical factor.
II. The preposition sense annotations, together with the other annotations of the corpus, should serve as a reference for the automatic classification of preposition senses.

With regard to these goals, the present paper is an intermediate report. The annotation schema has been developed, and the manual annotation of the corpus is well under way. The next logical steps will be to apply the annotations to a wider range of prepositions and eventually to use the annotated corpus for an automated classification system for preposition senses.

As PNCs form the basic rationale for the current investigation, we consider only prepositions that occur in PPs and PNCs in German. We thus systematically exclude prepositions that do not take an NP complement, postpositions, and complex prepositions. Accordingly, the sense annotation for prepositions currently comprises the following 22 simple prepositions in German:

(2) an, auf, bei, dank, durch, für, gegen, gemäß, hinter, in, mit, mittels, nach, neben, ohne, seit, über, um, unter, vor, während, wegen

As the empirical basis of the analysis, we use a Swiss German newspaper corpus, which contains about 230 million tokens (Neue Zürcher Zeitung 1993-1999).

The remainder of the paper is structured as follows: Section 2 is devoted to the characteristics of the annotation schema. In section 3, we present an analysis of the schema in terms of inter-annotator
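The two structural properties named in the abstract, a hierarchical taxonomy and multiple annotations per token, could be represented roughly as in the following Python sketch. It is only an illustration under stated assumptions: the class name AnnotatedPreposition, its methods, and the sense labels ("modal", "condition", "spatial") are hypothetical and are not taken from the actual schema; only the list of 22 prepositions comes from (2).

```python
from dataclasses import dataclass, field

# The 22 simple prepositions listed in (2) of the text.
PREPOSITIONS = frozenset({
    "an", "auf", "bei", "dank", "durch", "für", "gegen", "gemäß", "hinter",
    "in", "mit", "mittels", "nach", "neben", "ohne", "seit", "über", "um",
    "unter", "vor", "während", "wegen",
})

# A sense is stored as a path through the taxonomy, root label first.
# The labels used below are hypothetical placeholders, not the real tagset.
SensePath = tuple[str, ...]


@dataclass
class AnnotatedPreposition:
    """One preposition token together with all sense paths assigned to it."""
    token: str
    senses: list[SensePath] = field(default_factory=list)

    def add_sense(self, *path: str) -> None:
        """Attach a sense path; a token may carry several distinct paths."""
        if self.token.lower() not in PREPOSITIONS:
            raise ValueError(f"{self.token!r} is not one of the 22 prepositions")
        if not path:
            raise ValueError("a sense path needs at least one label")
        if path not in self.senses:  # ignore exact duplicates
            self.senses.append(path)

    def top_level_senses(self) -> set[str]:
        """Return the root labels of all assigned sense paths."""
        return {path[0] for path in self.senses}


# Usage: one token annotated with two (hypothetical) senses.
example = AnnotatedPreposition("unter")
example.add_sense("modal", "condition")
example.add_sense("spatial")
```

Storing each sense as a root-first path keeps the hierarchy visible in every annotation, so comparisons can be made at any depth of the taxonomy without a separate lookup table.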
Similar resources
The annotation of preposition senses in German
It seems to be common wisdom that prepositions (especially simple prepositions) are highly polysemous. Preposition senses seem to be well explored, as there exists plenty of literature on prepositions and their interpretations. On a closer inspection, however, it turns out that we are far away from a well-structured understanding of preposition senses. There is mutual consent about simple prepo...
A Logistic Regression Model of Determiner Omission in PPs
The realization of singular count nouns without an accompanying determiner inside a PP (determinerless PP, bare PP, Preposition-Noun Combination) has recently attracted some interest in computational linguistics. Yet, the relevant factors for determiner omission remain unclear, and conditions for determiner omission vary from language to language. We present a logistic regression model of deter...
Disambiguation of the Semantics of German Prepositions: a Case Study
In this paper, we describe our experiments in preposition disambiguation based on a – compared to a previous study – revised annotation scheme and new features derived from a matrix factorization approach as used in the field of distributional semantics. We report on the annotation and Maximum Entropy modelling of the word senses of two German prepositions, mit (‘with’) and auf (‘on’). 500 occu...
Towards standardized lexical semantic corpus annotation: Components of the Semantic Annotation Framework, SemAF
Spatial annotations form part of the Semantic Annotation Framework (SemAF). The current development SemAF-Space (ISO 24617-7) provides a formal specification but does not provide annotation guidelines. In my talk I will compare this approach with the approach developed for the annotation of preposition senses in Müller et al. (2011), where the annotation guidelines form the annotation specifica...
A Rank-based Distance Measure to Detect Polysemy and to Determine Salient Vector-Space Features for German Prepositions
This paper addresses vector space models of prepositions, a notoriously ambiguous word class. We propose a rank-based distance measure to explore the vector-spatial properties of the ambiguous objects, focusing on two research tasks: (i) to distinguish polysemous from monosemous prepositions in vector space; and (ii) to determine salient vector-space features for a classification of preposition...